2022-09-16

Members

Student Performance DataSet

  • The data cover student grades in a math course at two Portuguese public secondary schools during the 2005/2006 school year.
  • The dataset has 395 records in total, with 5 numerical variables and 28 categorical variables.
    • The numerical variables are age, number of school absences, 1st period grade (G1), 2nd period grade (G2) and final grade (G3, the target variable).
    • Because there are too many categorical variables to cover, our analysis includes only school, sex and number of past failures.
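
The variable selection above can be sketched as follows. This is only an illustration: the column names (`age`, `absences`, `failures`, and so on), the `;` delimiter, the school codes `GP` and `MS`, and the two sample rows are assumptions made for the example, not values taken from the slides.

```python
import csv
import io

# Columns kept for the analysis. The names are assumptions about the CSV header.
NUMERICAL = ["age", "absences", "G1", "G2", "G3"]
CATEGORICAL = ["school", "sex", "failures"]

def select_columns(rows):
    """Keep only the chosen variables and convert the numerical ones to int."""
    selected = []
    for row in rows:
        record = {name: row[name] for name in CATEGORICAL}
        record.update({name: int(row[name]) for name in NUMERICAL})
        selected.append(record)
    return selected

# Two made-up rows for illustration only (not taken from the dataset).
sample_csv = io.StringIO(
    "school;sex;age;address;failures;absences;G1;G2;G3\n"
    "GP;F;17;U;0;4;12;13;13\n"
    "MS;M;18;R;1;10;8;9;9\n"
)
rows = list(csv.DictReader(sample_csv, delimiter=";"))
students = select_columns(rows)
print(students[0])
```

Number of past failures is kept as a string here because the slides treat it as a categorical variable.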

First View of Data

Barplot for Categorical Variables


  • More students attend Gabriel Pereira school than Mousinho da Silveira school.
  • There are slightly more female students than male students.
  • Most students have never failed a course before.
Barplot for Numerical Variables


  • The red dashed line marks the mean, and the purple curve is a fitted density.
  • Mean: Age 17, Number of absences 6, G1 grade 11, G2 grade 11, G3 grade 10
  • Variance: Age 1.63, Number of absences 64.05, G1 grade 11.02, G2 grade 14.15, G3 grade 20.99
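
A minimal sketch of how such a mean and variance can be computed for one variable. It assumes the sample variance (dividing by n - 1), which the slides do not state, and the grades in the list are made up for illustration.

```python
from statistics import mean, variance

def summarize(values):
    """Return (mean, sample variance) of a list of numbers."""
    return mean(values), variance(values)  # variance() divides by n - 1

# Hypothetical grades for illustration only (not taken from the dataset).
g3 = [10, 12, 8, 15, 0, 11]
m, v = summarize(g3)
print(f"mean={m:.2f}, variance={v:.2f}")
```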

Testing Multivariate Normality After Splitting the Data by Sex and Failures

Female students with at least one past course failure

Statistic   \(\hat{\beta}\)   \(\kappa\)   p-val
Skewness    15.87             105.8        4.901e-09
Kurtosis    41.48             2.448        0.01437

Male students with at least one past course failure

Statistic   \(\hat{\beta}\)   \(\kappa\)   p-val
Skewness    10.48             75.14        9.406e-05
Kurtosis    35.42             0.1643       0.8695

Female students with no past course failures

Statistic   \(\hat{\beta}\)   \(\kappa\)   p-val
Skewness    40.14             1124         0
Kurtosis    76.29             31.98        0

Male students with no past course failures

Statistic   \(\hat{\beta}\)   \(\kappa\)   p-val
Skewness    24.02             576.5        0
Kurtosis    66.68             22.72        0
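  • For a univariate normal distribution, skewness is 0 (or close to 0 in a sample) and kurtosis is approximately 3. For the multivariate tests above, a small p-val indicates a departure from multivariate normality; at the 5% level, only the kurtosis test for male students who have failed another course (p-val 0.8695) does not reject it.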
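
The slides do not name the test, but the layout of the tables (\(\hat{\beta}\), \(\kappa\), p-val for skewness and kurtosis) resembles the output of Mardia's multivariate normality test. Assuming that is the test used, the sketch below shows how its statistics can be computed with NumPy. The function name `mardia` and the simulated data are illustrative, and the chi-square p-value for skewness is left out to avoid a further dependency.

```python
import math
import numpy as np

def mardia(X):
    """Mardia's multivariate skewness and kurtosis for an (n, p) data matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # center each column
    S = Xc.T @ Xc / n                    # maximum-likelihood covariance (divides by n)
    D = Xc @ np.linalg.inv(S) @ Xc.T     # D[i, j] = (x_i - mean)' S^-1 (x_j - mean)

    b1 = (D ** 3).sum() / n ** 2         # multivariate skewness
    b2 = (np.diag(D) ** 2).sum() / n     # multivariate kurtosis

    # Under normality: k1 ~ chi-square with p(p+1)(p+2)/6 df, k2 ~ N(0, 1).
    k1 = n * b1 / 6
    k2 = (b2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    p_kurt = math.erfc(abs(k2) / math.sqrt(2))   # two-sided normal p-value
    return {"skew": b1, "skew_kappa": k1, "kurt": b2, "kurt_kappa": k2, "kurt_p": p_kurt}

# Simulated normal data with 5 variables, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
result = mardia(X)
print(result)
```

Under this reading, the expected kurtosis for p variables is p(p + 2) rather than 3. With the 5 numerical variables that gives 35, which is close to the 35.42 reported for male students who have failed another course and in line with that group's large p-val.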

Future Analysis

  • PCA. Exploring the dataset, we found that the first period, second period and final grades (G1, G2, G3) are strongly positively correlated with each other. This motivates a future PCA to examine these relationships further.
  • Classification. Use the variables in the dataset to predict whether a student will pass or fail the math course. The final grade G3 can serve as the class label: since the maximum mark is 20, we label students with G3 below 10 as fail and those with G3 of 10 or above as pass.
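
Both directions can be sketched briefly. The block below is an illustration under stated assumptions: the function names are hypothetical, the grades are made up, and PCA is done here on the correlation matrix (that is, on standardized variables), which is one common choice and not something the slides specify.

```python
import numpy as np

PASS_MARK = 10  # grades run from 0 to 20; 10 or above counts as a pass

def label_pass_fail(g3):
    """Map final grades to class labels: 'pass' if G3 >= 10, else 'fail'."""
    return ["pass" if g >= PASS_MARK else "fail" for g in g3]

def pca_explained_variance(X):
    """Share of variance explained by each principal component of standardized data."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]                  # sorted largest first
    return eigenvalues / eigenvalues.sum()

# Hypothetical (G1, G2, G3) rows for illustration only (not taken from the dataset).
grades = np.array([
    [12, 13, 13],
    [8, 9, 9],
    [15, 14, 16],
    [6, 5, 0],
    [10, 11, 10],
    [17, 18, 18],
])
print(label_pass_fail(grades[:, 2]))
print(pca_explained_variance(grades))
```

When the three grades are strongly correlated, the first component explains most of the variance, which is the pattern a PCA on the real data would be checking for.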

Thank you